High-precision Monte Carlo study of the three-dimensional XY model on GPU 
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We perform large-scale Monte Carlo simulations of the classical XY model on a three-dimensional 
L x L x L cubic lattice using the graphics processing unit (GPU). By the combination of Metropolis 
single-spin flip, over-relaxation and parallel-tempering methods, we simulate systems up to L — 160. 
Performing the finite-size scaling analysis, we obtain estimates of the critical exponents for the three- 
dimensional XY universality class: a — —0.01293(48) and v — 0.67098(16). Our estimate for the 
correlation-length exponent v, in contrast to previous theoretical estimates, agrees with the most 
recent experimental estimate f cxp = 0.6709(1) at the superfluid transition of 4 He in a microgravity 
environment. 



I. INTRODUCTION 

One of the most beautiful ideas in physics is the 
renormalization-group (RG) theory^ which states that 
near a critical point, the nature of the phase transition 
can be described by a few universal properties. These 
universal properties depend only upon the spatial dimen- 
sionality, and the symmetry of the order parameter, re- 
gardless of the microscopic details of the system. This 
indicates that the phase transitions can be classified into 
different universality classes, and the asymptotic critical 
behaviors of each class are described by a set of criti- 
cal exponents and scaling functions. Among these, the 
three-dimensional (3D) XY or 0(2) universality class is 
the most extensively studied due to its relevance to the 
nature of the phase transitions in several physical sys- 
tems, such as the A-transition in 4 He. Experimentally, 
this superfluid transition permits the most accurate mea- 
surements of the critical exponents up to date in the 
micro-gravity environment-- The most recent value of the 
correlation-length exponent is f cxp = 0.6709(1), which is 
derived from the measured value of the specific-heat ex- 
ponent a via the hyperscaling relation^ 

The XY universality class has been studied by var- 
ious theoretical approaches: analytical field-theoretical 
methods^, and numerical methods such as high- 
temperature (HT) expansions^ and Monte Carlo (MC) 
simulations MC simulations combined with the 
finite-size scaling (FSS) technique^ have long been used 
to estimate the critical exponents of phase transitions. 
With smaller system sizes, deviation from the universal 
behavior due to the irrelevant scaling operators can be 
the source of systematic errors in the FSS analyses, and 
corrections to scaling become necessary. It has been pro- 
posed that the corrections to scaling can be minimized 
by simulating the 3D two-component 4 model on a sim- 
ple cubic lattice, which belongs to the 3D XY univer- 
sality class, with a proper choice of a parameter in the 
model.- ,8 However, the effect of sub-leading terms can 
only be partially suppressed due to the limited resolu- 
tion in the tuning parameter. On the other hand, by 
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FIG. 1. (Color online) Results for v as a function of time. 
The circles show the experimental values, the upper triangle 
depicts the field-theoretical calculations, the squares show the 
Monte Carlo results, and the filled squares are the results of 
this work. 



including sub-leading corrections in the fits, it is argued 
that it is possible to obtain more precise estimates of the 
critical exponents^ The history of recent results for the 
correlation-length exponent v is given in Fig. Q] 

Including the sub-leading correction brings complica- 
tion since it is necessary to perform a high-dimensional fit 
to a non-polynomial function, which might be sensitive 
to the numerical instability; therefore, a direct simulation 
of larger system sizes is desirable. Large-size simulations 
are crucial in developing a clear signature of criticality, 
and for accurate determination of the critical properties. 
In recent years, the advance of general purpose comput- 
ing on the graphics processing units (GPUs) makes it 
possible to perform large-scale simulations in a massively 
parallel scheme. 12 In this paper, we present our MC sim- 
ulations of the XY model on an L x L x L cubic lattice up 
to L = 160 on GPU. Using data obtained from large-size 
systems, we are able to obtain the critical exponents with 
higher precision than previously achieved. This paper is 
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organized as follows. In Sec. [TH we briefly discuss our 
simulation and analysis methods. Results of the simula- 
tion and a comparison with other works are presented in 
Sec. IIIII Finally, we conclude in Sec. |lVl 
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II. 



MODEL AND METHOD 



We simulate the classical XY model of the unit-length 
vectors = (cos0j,sin0j) on an L x L x L cubic lattice 
with the Hamiltonian, 



u 



(1) 



where indicates the nearest neighbor. The periodic 
boundary condition is applied. In the following, we set 
J = k B = l. 

We implement the GPU version of the Monte Carlo 
simulation based on the NVIDIA CUDA framework^ 
We refer interested readers to available literature for an 
introduction to the details of the GPU hardware and 
the programming models^ We implement three update 
schemes for the GPU version of the Monte Carlo simula- 
tion: parallel Metropolis single-spin flip, over-relaxation, 
and parallel-tempering methods. One Monte Carlo step 
(MCS) is defined as one Metropolis sweep and one over- 
relaxation sweep of the entire lattice, followed by one 
parallel-tempering exchange. To implement the paral- 
lel Metropolis and over-relaxation updates suitable for 
the GPU, we divide the entire lattice into blocks of 
16 x 16 x 16 = 4096 spins. Each block is decomposed 
into sub-blocks which belong to two different sub-lattices 
(Fig. [2j. Each block is assigned to a thread bloch^ con- 
taining 8x8x8 = 512 threads, which execute the same 
GPU kernel in parallel^ Each thread is responsible for 
updating 2x2x2 = 8 spins, with four black sites and four 
white sites, so that there are enough arithmetic opera- 
tions to hide the latency of the global memory accesses M. 
We apply the checkerboard decomposition algorithm to 
perform the Metropolis single-spin flips in paralleL 14 i 15 
We first update all the black spins in parallel via a GPU 
kernel. After all the black spins belonging to different 
blocks are updated, another kernel is launched to update 
all the white spins. Due to the special architecture of 
the GPU, the commonly used Mersenne- Twister (MT) 
random number generator (RNG) can not be efficiently 
implemented at the thread level. Instead, we use a faster 
RNG implementation specially designed for the GPU ar- 
chitecture, the Warp Generator^ We note that although 
it has a smaller period of 2 1024 - 1 than MT (2 19937 - 1), 
we do not find any noticeable statistical bias compared 
with the CPU runs using MT. 

It is well established that the single-spin flip Metropolis 
update suffers from the critical slowing down near the 
critical point and one has to resort to cluster updates i 17 i 18 
However, implementation of the cluster update on GPU 
is complicated and less efficient J£ We instead implement 
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FIG. 2. (Color online) Checkerboard decomposition in 3D. 
The full lattice is decomposed into blocks containing 16 x 16 x 
16 = 4096 spins each. Each block is assigned to a thread block 
containing 8x8x8 = 512 threads. Each thread manipulate 
2x2x2 = 8 spins, four black sites and four white sites. 



the microcanonical over-relaxation update^^ The new 
value of the spin on site % is obtained by reflecting the 



spin at its local molecular field 
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This update maps the system from a point in the phase 
space to another point with exactly the same energy. Af- 
ter several sweeps, the system is able to explore a larger 
region of the phase space without being stuck in a partic- 
ular local minimum, and the simulation becomes ergodic. 

To better equilibrate the simulation, we also perform 
the parallel-tempering (PT) Monte Carlo.^i In the PT 
scheme, many replicas at different temperatures are sim- 
ulated simultaneously. After a certain number of MCSs, 
we swap two adjacent configurations X m , X n at neighbor- 
ing temperatures T m , T n with the acceptance probability 
of 



W(X m ,T m \X ni T n ) =min 



l,e 



(l/T m -l/T„)(B m -B n ) 



(3) 

where E n is the total energy of replica n. 

We measure the following quantities during the simula- 
tion: magnetization to, Binder ratio Ul and spin stiffness 
p s with the following estimators, 



(4) 
(5) 
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where fi — x,y and z, and is the unit vector connecting 
nearest-neighbor sites i and j. 

To reduce the amount of data transfer between the 
CPU and the GPU, we store all the spin configurations 
at different temperatures in the GPU global memory, and 
all updates are performed through the kernel functions on 
the GPU. Measurements are also performed on the GPU 
and the results are sent back to the CPU for binning. 
Simulations are carried out at 34 temperatures ranging 
from T = 2.1 to T = 2.3 for L = 64, 80, 96, and 128 and 
at 13 temperatures ranging from T = 2.19 to T = 2.21 
for L = 160. The temperature set is chosen such that 
the acceptance rate of swaps is independent of the tem- 
peratures. After 3 x 10 6 MCSs for equilibrium, 1.3 x 10 7 
measurements are made for L — 64, 80, and 96 and 2 x 10 7 
measurements are made for L = 128 and 160. The data 
are blocked into several bins, each consisting of 10 5 mea- 
surements, for further analyses. Error bars are given by 
one standard deviation. The simulations were performed 
on Nvidia Tesla M2090, and took approximately 110 days 
of GPU time in total to accumulate the whole data set. 



III. RESULTS 

We perform the finite-size scaling analyses to extract 
the critical behaviors in the thermodynamic limiti 11 ! 22 
The singular part of the free energy with critical exponent 
k in zero-field can be describe by the scaling ansatz 

F(t,L) = L K l v J*{tL x l v ), (7) 

where v is the correlation- length critical exponent, t = 
(T — T c )/T c is the reduced temperature , and J 70 (a;) is 
an universal function which is analytic as x — > 0. 

We first use the spin stiffness p s and the Binder cu- 
mulant Ul to estimate T c . The spin stiffness p s scales 
as 

p. = L- x H°{tlM v ) (8) 

for the 3D XY model. This indicates that p s L at different 
sizes should intersect at T c (Fig. [3]). Also, the Binder cu- 
mulant Ul for different sizes also intersect at T c (Fig. [4j . 
In both cases, we obtain the crossing of the curves at a 
consistent T c = 2.2019. 

Further refinement of the analysis is carried out by 
data collapse. We first perform data collapse on the mag- 
netization m (Fig. [5]) to obtain T c , f3 and v. Near T Cl m 
has the scaling form 

m(t,L) =L-^ v M°(tL 1 ^), (9) 

where A4° is an universal function. We use bootstrap 
resampling technique^ to decorrelate the data obtained 
by the PT at different temperatures. In each resampling, 
1000 values are randomly chosen from the 130 (or 200) 
measurements and then take the average. The resam- 
pling is repeated for 130 (or 200) times to generate a new 
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FIG. 3. (Color online) p 3 L vs temperature for L — 64, 80, 96, 
128 and 160. The error bars are smaller than the symbols. 
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FIG. 4. (Color online) Ul vs temperature for L = 64, 80, 96, 
128 and 160. The error bars are smaller than the symbols. 



data set. This data set is used to perform data collapse 
to obtain T c , (3 and v. Temperatures between 2.1990 and 
2.2050 are used for data collapse. A fifth-order polyno- 
mial is used to fit the scaling function AA°, and the phase 
space of T c , (3 and v is scanned for the best collapse, 
where the reduced chi-square x rc( j = x/d.o.f. approaches 
one. The procedure is repeated 100 times to estimate 
the error bars of T c , j3 and v (Fig. [5]). Our estimates are 
T c = 2.201852(1), = 0.34910(12) and v = 0.67138(11) 
with the average x 2 od « 1.2938. We use the hyperscaling 
relation 2 ^ 

2-a = 3^ = 2/3 + 7, (10) 

to obtain the estimates for a = —0.01414(33) and 7 = 
1.31594(41). 
We also perform data collapse on (72 defined as, 

Q 2 (t, L) = 3(1 - U L ) = ^% = q 2 (tL^)- (H) 
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TABLE I. Comparison of the critical exponents determined via various methods for three-dimensional XY universality class. 
The quantities with asterisk are calculated using the scaling relation 7 = (2 — rj)v or the hyperscaling relation [Eq. (|l(JJl ]. and 
errors are calculated by treating variables as independent. 
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MC 


this work 
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[10] (2006) 


-0.0151(3)* 
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[9] (2006) 


-0.0151(9)* 






0.6717(3) 
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[5J (2001) 


-0.0146(8)* 


0.3485(3)* 
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[5] (2001) 
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0.3485(2)* 
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[4] (2001) 


-0.0112(21) 


0.3474(11)* 


1.3164(8) 


0.6704(7) 


MC" 


[8] (1999) 


-0.0169(33)* 


0.349(2)* 


1.3190(24) 


0.6723(11) 


PRG^ 


[7] (1993) 


-0.014(21)* 




1.307(14)* 


0.662(7) 


MC ~ 


[7] (1993) 






1.324(1) 




exp^ 


[2J (2003) 


-0.0127(3) 






0.6709(1) 


exp 


[2J (1996) 


-0.01056(38) 






0.67019(13) 
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b two-component 4 field theory 

c phenomenological RG 
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FIG. 5. (Color online) Scaling of m and 52 for L=64 (black), 
80 (red), 96 (green), 128 (blue) and 160 (orange). The dashed 
lines correspond to polynomial fits to the corresponding scal- 
ing functions. 



The same procedure of resampling is applied as above. 
Data at temperatures between 2.1990 and 2.2040 are used 
for data collapse. A fourth-order polynomial is used to 
fit Q2. The phase space of T c and v is scanned to produce 
the best collapse. The procedure is repeated 100 times to 
estimate the error bars for T c and v. Figure [5] also shows 
the scaling plot of q 2 for L = 64, 80, 96, 128 and 160. Our 
estimates give T c = 2.2018312(6), v = 0.67098(16) with 
average Xred ~ 1-2458. Using the hyperscaling relation 
Eq. (dHl), we obtain the estimates for a = -0.01293(48). 

In Table HI we compare our estimates with recent re- 
sults for the critical exponents of the 3D XY universal- 
ity class. Our estimates of v are consistent with each 
other within two standard deviations (Fig. [T]), and the 
error bars are comparable with the experimental preci- 



sion. However, they are smaller than previous theoretical 
estimates for is, 9 ' 10 and, contrary to previous claims, are 
consistent with the experimental estimate^. This might 
be attributed to the largest system sizes (L = 128, 160) 
with high statistics that we can simulate, and a better 
finite-size scaling analysis can be performed. 



IV. CONCLUSION 

We perform large-scale Monte Carlo calculations of the 
3D XY model. Implementation of efficient, highly paral- 
lel Monte Carlo update schemes on GPU enables us to 
perform simulations on lattices up to L = 160. With 
larger system sizes, we are able to perform finite-size 
scaling and obtain a five-digit accuracy of the critical 
exponents in a significantly less amount of computation 
time. With the current accuracy for v and a in our sim- 
ulations, contrary to previous theoretical studies, our re- 
sults suggest that the theoretical estimates of the critical 
exponents for the 3D XY universality class are consis- 
tent with experimental results within two standard de- 
viations. This suggest that the A-transition in He 4 can 
indeed be accurately described by the 3D XY universal- 
ity class. It would be interesting to revisit the models 
studied previously to further confirm results obtained in 
this paper. 
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